A Survey of Distributed Fault Tolerance Strategies
Abstract
Grid computing is characterized by geographic distribution, heterogeneity (different hardware, software, and networks), resource sharing, multiple administrators, dependable access, and pervasive access within dynamic organizations. In grid computing, the failure rate is much higher than in traditional parallel computing. Fault tolerance is therefore an important property for achieving reliability, availability, and QoS. In this paper, we survey various fault tolerance techniques and fault management approaches in different situations, along with related issues. The fault tolerance service deals with various types of resource failures. The survey presents related research results on fault tolerance in grid infrastructure, outlines future directions for fault tolerance techniques, and aims to serve as a guide for researchers.
Similar resources
Fault Tolerance in Wireless Sensor Networks using Stiffed Delaunay Triangulation
Abstract: In this chapter, we introduce fault tolerance in Wireless Sensor Networks. First, we give a short description of sensor networks, fault tolerance, and its different techniques. Then we discuss the different phases of fault tolerance (fault models, fault detection and identification at five levels of abstraction (physical, hardware, middleware, system software and applications)...
A Detailed Review of Fault-Tolerance Techniques in Distributed System
In this paper, we survey various fault tolerance techniques and related issues in distributed systems. More specifically, we discuss the two most important issues: multiple-fault handling capability and performance. The survey presents related research results, explores future directions for fault tolerance techniques, and is intended as a reference for researchers.
A Survey of Secure, Fault-tolerant Distributed File Systems
We survey four secure, fault-tolerant distributed file systems: Farsite, OceanStore, Ivy, and Frangipani. We analyze each with respect to fault tolerance, scalability, usability, maintenance overhead, and consistency. Finally, we present a taxonomy for such file systems based on their failure models, update mechanisms, and data location schemes.
Improving the palbimm scheduling algorithm for fault tolerance in cloud computing
Cloud computing is the latest technology that involves distributed computation over the Internet. It meets the needs of users through sharing resources and using virtual technology. The workflow user applications refer to a set of tasks to be processed within the cloud environment. Scheduling algorithms have a lot to do with the efficiency of cloud computing environments through selection of su...
A Survey: Load Balancing for Distributed File System
Distributed systems are useful for computing on and storing large-scale data at dispersed locations. A Distributed File System (DFS) is a subsystem of a distributed system. A DFS is a means of sharing storage space and data. In a DFS, servers, storage devices, and clients are at dispersed locations. Fault tolerance and scalability are two main features of a distributed file system. Performance of DFS is...